Four groups of college undergraduates took a multiple-choice computer-managed
test. Three of these groups received informative feedback (the entire item
with the correct answer identified) either (1) immediately item-by-item
(2-second delay), (2) following the entire test (20-minute delay), or (3) 24
hours later (24-hour delay). The fourth (control) group received no feedback.
Scores on a criterion test, given 1 to 3 weeks later, showed that retention
was significantly better for the two delayed feedback groups (20-minute and
24-hour delay) than for the immediate feedback group (2-second delay). These
results confirmed the finding of previous laboratory experiments that retention
following delayed feedback is not degraded by the delay.


FOREWORD 


This research and development was sponsored by the Advanced Research
Projects Agency under ARPA Order No. 3181, and is part of the ARPA training
technology program. A principal objective of this program is the development
of computer-based training technology for DoD-wide application.

This study is the first in a series investigating the relative effectiveness
of providing immediate or delayed feedback for answers on computer-managed
tests. The results provide important background information for potential
application in Navy training whenever computer-managed testing is used.
In contrast to the commonly held view that immediate feedback is essential
to promote learning and retention, the results of this practical study
confirmed the finding of prior laboratory studies that retention following
delayed feedback is not degraded by the delay.

Dr.  William  E.  Montague  served  as  the  technical  contract  monitor. 


J.  J.  CLARKIN 
Commanding  Officer 
Problem

Most programs for computer-managed and computer-assisted instruction are
based on the principle that providing immediate informative feedback (that is,
information about the correctness of the student's response) is essential if
learning is to occur. However, findings of several experimental studies raise
doubts as to whether it is necessary to provide immediate feedback to
promote better retention or whether delays in informative feedback (i.e.,
when there is an interval between the student's response and the presentation
of feedback) can be tolerated. Since it is extremely expensive to
provide for immediate feedback in the design of computer-managed and
computer-assisted instructional systems, sound data are needed to determine
the relative effectiveness of the two methods.

Purpose 

The purpose of the present experiment was to determine whether the findings
of short laboratory studies conducted previously would generalize to
computer-managed testing in a college course. The specific purpose was to
determine whether delivering feedback immediately or after a delay interval
differentially affected later retention.

Approach 

As part of an undergraduate course, four groups of students were administered
a computer-managed test composed of 30 multiple-choice items. Three of the
groups received informative feedback: the first, item-by-item, immediately
after students had completed each item (2-second delay); the second, immediately
after they had completed the entire set of items (20-minute delay); and the
third, 24 hours later (24-hour delay). The fourth (control) group received no
feedback. From 1 to 3 weeks later, a criterion test over the same material was
administered to all groups. The criterion test included 47 multiple-choice
items (30 that were the same as those in the initial test, and 17 others on the
same material) and 10 short-answer items (five that were adapted from
multiple-choice items in the initial test, and five others on the same
material). For multiple-choice items in both tests, students recorded their
chosen alternative and their confidence in the correctness of that choice.
Also, an anxiety scale was administered to all groups before and after they
took the computer-managed test, when they returned to the testing room 24
hours later to receive their test score, and after they took the criterion
test; and to the feedback groups immediately after they had received and
studied their feedback.

Analyses were conducted to determine whether there were any differences
between the feedback groups in (1) performance on multiple-choice items included
in both tests, multiple-choice items included in the criterion test only, and
short-answer "same" and "different" items; (2) time required to answer
the items and to study informative feedback; and (3) anxiety experienced
before and after testing.
Findings


1. Analyses of performance on multiple-choice items that were included
in both tests showed that feedback conditions had a significant effect on
several measures:

a. The overall mean correct for the immediate feedback condition
(2-second delay) and the two delayed feedback conditions (20-minute and
24-hour delay) combined was significantly greater than that for the no feedback
(control) condition.

b. The mean correct for the two delayed feedback conditions combined
was significantly greater than that for the immediate feedback condition.

These results can be attributed to the feedback effects for items that were
incorrect on the first test; that is, for the proportion of items that were
wrong on the first test and right on the second, reliable effects of feedback
conditions were found, while for items that were correct on both tests, no
effects were found.

c. Feedback conditions also had a significant effect on the amount of
change in confidence ratings for items that were wrong on the initial test and
right on the second. The 24-hour delay group had a significantly greater change
in confidence ratings than any of the other groups; and the 20-minute delay
group, a significantly greater change than the 2-second immediate feedback group.

d. When items were categorized in terms of their difficulty, feedback
conditions had a significant effect only for the most difficult ones. The
relationships among feedback conditions were the same as those reported in b
above.

2.  Feedback  conditions  had  no  significant  effect  on  any  of  the  other 
measures. 

Conclusions 

A consistent finding of previous laboratory experiments, that long-term
retention following immediate feedback is not superior to that with delayed
feedback, has been confirmed with computer-managed tests in an educational
setting. In fact, some delay in presentation of feedback results in superior
retention, and the longer 24-hour delay has a greater effect on the change in
students' confidence in their answers.

Recommendations

1. Further research should be conducted to extend the findings of the
present study by comparing the relative effects of immediate and delayed
feedback under other experimental conditions (e.g., using different forms of
feedback presentation and/or criterion test items and conducting repeated
computer-managed tests with informative feedback throughout a course).

2. It is assumed that these results are due to an increase in student
concentration on feedback that influences the level or breadth of processing of
the remembered information and the feedback. Therefore, procedures that foster
the breadth of processing should be developed and evaluated.

3. The appropriateness of requiring immediate feedback to be provided
in curriculum development (NAVEDTRA 106A, 1975) should be reconsidered, since
such conditions may not be optimum for learning.
INTRODUCTION


Problem

Most programs for programmed, computer-managed, and computer-assisted
instruction are based on the principle that providing immediate informative
feedback (that is, immediate information to the student about the correctness
of his or her response to questions) is essential to promote learning
and retention. However, several laboratory studies have found that
retention following delayed feedback (i.e., when there is an interval
between the student's response and the presentation of feedback) is not
degraded by the delay, which raises doubts as to the validity of that
principle.

Since it is extremely laborious and expensive to provide for immediate
feedback in the design of computer-managed and computer-assisted instructional
systems, sound data are needed to determine whether these laboratory results
generalize to actual course conditions.

Background 

An extensive experimental literature exists that is concerned with
corrective feedback and its effects on student learning and retention (Kulhavy,
1976). Concern for the effects of the delay of feedback resulted from: (1)
the idea that since feedback was a form of reinforcement, it should function
as food does in shaping the behavior of a hungry organism, and (2) the findings
that delays in food reinforcement were often disruptive to performance.
Therefore, if feedback reinforces correct responses, any delay should produce
poorer learning and memory. The experimental studies in this area were
conducted mostly in laboratory situations using a wide variety of learning tasks,
ranging from simple discrimination studies for children (e.g., when learning
to choose between geometric shapes) to segments of actual lesson materials
for children and adults. Since a variety of procedures were used in these
studies, terminological confusions arise in trying to summarize them and to
extrapolate their results.

Delays in giving feedback about the correctness of answers to each test
question can be introduced in various ways. For example, "immediate"
feedback may be provided within a second or two after a student makes a response,
while "delayed" feedback may be provided 10 or 20 seconds after the response.
In other instances, the student might have to answer all questions on a test
before any informative feedback is given. Thus, even if feedback is given
"immediately" after the entire test, the specific feedback for the initial
question is actually delayed by the length of time the student needs to
answer the remaining questions and to receive feedback about all answers.
The effect of any feedback manipulation is evaluated by giving another
test, a retention/criterion test, either a short time (seconds or minutes), or
a longer time (usually a day or more) after feedback treatment is completed.

The findings of studies providing short post-item-answer delays of less than
a minute, and those providing longer, post-test delays on short retention
intervals are consistent: Only occasionally does delay produce any detrimental
effect on short-retention-interval test performance; in those cases,
it usually involves the learning of simpler tasks. Most studies reveal no
significant effect of delay. However, when retention tests are delayed 1
to 7 days, performance of subjects receiving feedback after a delay is
often superior to that of subjects receiving feedback immediately. It
is important to note that no study found that long-term retention following
immediate feedback was superior to that following delayed feedback.

Since the studies using simple learning tasks (e.g., Brackbill, 1964)
differ in procedures and purpose from those concerned with tasks more relevant
to instructional problems, for purposes of this report they will be ignored
in favor of the more relevant studies. In the period since 1960, 13
experiments have been reported, all of which used academic-type materials and
compared immediate feedback (intervals ranging from 2 seconds to 20 minutes)
with delayed feedback (intervals ranging from 24 to 48 hours). Thus, the
following discussion will be limited to those 13 experiments, which were
reported by English and Kinzer (1966); Kulhavy and Anderson (1972); More
(1969); Newman, Williams, and Hiller (1974); Phye and Bailer (1970); Sassenrath
and Yonge (1968); Sturges (1969; 1972, Experiments 1 and 2); Sturges and
Crawford (1964, Experiments 1, 2, and 4); and Surber and Anderson (1975).

Delay  Retention  Effect 

In 11 of these experiments, all but those reported by Newman et al.
and Sturges and Crawford (Experiment 4), it was found that retention following
delayed feedback was superior to that following immediate feedback. This
phenomenon has been called the "delay retention effect." Newman et al. had
no evidence that students had learned from informative feedback, since the
performance of the two feedback groups did not differ from that of a control
group receiving no feedback. The feedback provided by Sturges and Crawford
in their fourth experiment consisted of giving all alternatives, along with
a cue directing the student to the correct answer.

The  studies  in  which  the  delay  retention  effect  occurred  were  identical 
in  regard  to  the  following  conditions: 

1. As indicated previously, the learning task concerned academic-type
material.

2. Initial test items and the informative feedback were in a
multiple-choice format.

3.  Informative  feedback  was  presented  only  once. 

4.  The  items  in  the  initial  and  retention  tests  were  identical. 

These  studies  also  had  a number  of  varied  conditions: 

1.  In  some  studies,  the  initial  test  and  presentation  of  feedback 
represented  the  student's  first  exposure  to  the  learning  material;  in  others, 
the  student  had  studied  the  material  before  being  tested. 

2. In some studies, students were given a retention test immediately
after receiving feedback and a long-term retention test later; in others,
they were given only a long-term retention test.

3. In some studies, immediate feedback referred to item-by-item
presentation immediately after the student made each response (approximately
2-second interval); in others, it referred to feedback presented after the
student had responded to the entire set of test questions (20-minute interval).
For the purposes of this report, the latter (20-minute interval) will be
considered delayed feedback.

4.  In  one  study  (Phye  & Bailer,  1970),  feedback  was  presented 
auditorily;  in  the  rest,  it  was  presented  in  printed  form. 

5.  In  one  study  (Sturges,  1972),  retention  was  measured  by  a recall 
test  as  well  as  a multiple-choice  retention  test;  in  the  rest,  no  recall  test 
was  included. 

Variables  Affecting  Delay  Retention  Effect 

The  delay  retention  effect  is  affected  by: 

1.  The  form  of  informative  feedback  or  the  amount  of  information  or 
stimulus  aspects  presented  at  feedback. 

2.  The  accuracy  of  the  student's  initial  response  to  an  item. 

In  addition,  a variable  of  potential  importance  is  the  amount  of 
anxiety  the  student  is  experiencing  when  he  receives  feedback. 

These  variables  are  discussed  in  the  following  paragraphs. 

Form of Informative Feedback. Sturges (1969) and Phye and Bailer (1970)
found that the delay retention effect occurred when feedback consisted of
incorrect alternatives as well as the correct alternative but not when it
consisted of only the correct alternative. Sturges (1972) extended this finding
when she found that the delay retention effect occurred with this form of
information feedback only when the incorrect alternatives were relevant to the
material in the retention test. The feedback provided by Sassenrath and Yonge
(1968) also consisted of correct and incorrect alternatives; however, in some
cases, it included the stem, while in others it did not. The delay retention
effect occurred under both conditions.

Sturges (1972) also compared the effect of providing delayed feedback
for six different types of feedback, including that used by Sturges and Crawford
(1964) in their fourth experiment: all alternatives along with a cue directing
the student to the correct answer. As indicated previously, using this method,
retention after immediate feedback was equal to that with delayed feedback.
On the basis of this finding, Sturges concluded that the information included
in feedback determines how students respond to and thus learn from that feedback.
She also concluded that the student's reaction to, and the amount learned from,
that feedback is affected by the length of the delay interval. That is, when
immediate feedback is presented, the student can determine the correctness of
his initial response by merely checking to see whether the number of that
response agrees with the correct alternative. On the other hand, with delayed
feedback, he must read the item and study all of the information included in
feedback to determine whether his answer was correct. Retention test
performance will be improved, then, when the student is required (at feedback) to
recall and "think about" the information relevant to the requirements of the
retention test.
Accuracy of Student's Initial Response. Kulhavy and Anderson (1972)
suggested a "perseveration-interference" hypothesis, which may explain the
delay retention effect. According to this hypothesis, for a time after making
a response, the memory for the response "perseverates" temporarily. Thus,
if the response is incorrect, this perseveration interferes with correcting
the response when feedback is immediate. With delayed feedback, perseveration
is over and, thus, there is no interference. Surber and Anderson (1975)
gave support to this hypothesis when they found that delayed feedback is more
effective for items that were initially incorrect.


Anxiety Experienced at Feedback. According to the Drive Theory (Spence,
1958; Taylor, 1956), subjects experiencing a high degree of anxiety do not
perform as well as those with a lower degree of anxiety on complex or
difficult learning tasks in which competing error tendencies are stronger than
the tendency to select the correct choice. An example of a difficult item
having strong competing tendencies is one where, initially, the student's
tendency to select an incorrect choice is stronger than his tendency to select
the correct choice, or one where, initially, the student's tendency to select
the correct choice was about equal to his tendency to select one or more
incorrect alternatives. Based on this theory, the greater the anxiety at the
time of feedback, the greater the strength of all the student's response
tendencies and the greater the competition among the correct and incorrect
alternatives. Thus, if the student has more anxiety at the time of immediate
feedback than he does at the time of delayed feedback, it follows that his
retention after delayed feedback will be superior to that after immediate
feedback.


According to the Trait-State Anxiety Theory (Spielberger, 1966, 1971),
it is essential to distinguish between "state" anxiety and "trait" anxiety.
State anxiety refers to a complex response condition that varies in intensity
and fluctuates over time; it is characterized by feelings of tension and
apprehension and by activation of the autonomic nervous system. Trait anxiety
refers to relatively stable individual differences in anxiety proneness.

Several recent studies of computer-assisted instruction have examined
state anxiety and have supported the contention that periodic state anxiety
measures can be used to investigate the relationship between anxiety and
performance (Leherissey, O'Neil, & Hansen, 1971; O'Neil, 1972; O'Neil, Spielberger,
& Hansen, 1969; Leherissey, O'Neil, Heinrich, & Hansen, 1973). These studies
used the 20-item State Anxiety Scale from the State-Trait Anxiety Inventory
(Spielberger, Gorsuch, & Lushene, 1970) to measure state anxiety of students
when learning materials presented via computer-assisted instruction. Results
showed that higher levels of state anxiety were associated with the more
difficult learning materials, and that subjects with high state anxiety made more
errors in these materials. These findings indicate that the State Anxiety
Scale can be used to investigate the hypothesis that the delay retention
effect is related to different amounts of anxiety experienced at immediate
and delayed feedback.


The purpose of the present experiment was to determine whether the findings
of the studies described above were applicable to computer-managed testing in
a college course. The specific purpose was to determine whether immediate
or delayed feedback had differential effects on later retention.
METHOD


Experimental  Conditions 

The  experimental  conditions  of  this  experiment  were  made  as  similar  as 
possible  to  those  in  which  the  delay  retention  effect  occurred: 

1.  Students  were  given  a computer-managed  test  covering  regular  class 
material. 

2.  Both  the  computer-managed  test  and  the  retention  (criterion)  test 
were  in  a multiple-choice  format. 

3.  Informative  feedback  was  the  re-representation  of  each  item,  with  an 
indication  of  the  correct  alternative. 

4. Four feedback conditions were provided: one immediate feedback
(2-second delay), two delayed feedback (20-minute and 24-hour delay), and
one no feedback (control). In previous experiments, feedback with a
20-minute delay was referred to as "immediate," since it was provided
immediately after the student had responded to an entire set of items.
However, since this procedure produces a considerable delay between response
to an item and its feedback, it will be classed as delayed feedback in
this report.

5.  Twenty-four  hours  after  taking  the  computer-managed  test,  students 
assigned  to  all  feedback  conditions  were  given  the  score  obtained  on  the 
test  (i.e.,  total  number  and  percent  correct). 

6.  Between  1 and  3 weeks  later,  students  were  tested  for  retention  of 
this  material. 

7.  For  both  the  computer-managed  test  and  the  criterion  test,  two 
additional  measures  were  used:  state  anxiety  and  confidence  ratings.  State 
anxiety  was  measured  to  determine  whether  there  was  a difference  among  the 
delayed  feedback  conditions  as  to  the  amount  of  anxiety  experienced  by  the 
student  when  feedback  was  presented.  Estimates  of  a subject's  confidence  in 
his  response  were  used  to  provide  a more  continuous  measure  of  feedback 
effectiveness  than  that  provided  by  percent  correct  on  the  tests. 

Subjects  and  Design 

The 112 students in four sections of the upper-division course in Child
Psychology at the California State University at Chico participated in the
experiment. The course was taught by two instructors (A and B), each
responsible for two sections. The students assigned to each instructor were
assigned to a randomized block design; the blocking variable was the total
score on the first two tests in the course.*


*The correlation between the blocking variable and the score on the
computer-managed test was r = .51 for each instructor; the correlation
between the blocking variable and the multiple-choice items that were
in both the computer-managed and the criterion tests was r = .36 for each
instructor. The degree of these relationships justifies the use of a
randomized blocks design (see Hayes, 1963, p. 455).
Within the blocks, students were randomly assigned to one of the following
feedback conditions:

1. No feedback provided (control group).

2. Feedback item-by-item, about 2 seconds after the student makes each
response.

3. Feedback with about a 20-minute delay interval (provided after the
student has responded to an entire set of items).

4.  Feedback  provided  after  a 24-hour  delay  interval. 

The  number  of  students  assigned  to  each  feedback  condition  is  shown  in  Table  1. 


Table 1

Students Assigned to Various Feedback Intervals

                                          Feedback Intervals
                           None      ---------------------------------
Instructor      Block    (Control)   2-Second   20-Minute   24-Hour     Total

A               I            4           4          3          3          14
                II           3           3          3          4          13
                III          4           4          3          3          14
                IV           3           3          4          3          13
Total, Instructor A         14          14         13         13          54

B               I            4           4          4          3          15
                II           2           4          3          5          14
                III          5           2          4          4          15
                IV           2           4          3          5          14
Total, Instructor B         13          14         14         17          58

GRAND TOTAL                 27          28         27         30         112
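
The blocked random assignment described above can be illustrated with a short
sketch. The report does not state the exact mechanics, so the following is
only one plausible procedure under stated assumptions: students are ranked on
the blocking score, the ranking is cut into four contiguous blocks, and
conditions are dealt out in random order within each block. The function name,
the roster, and the scores are hypothetical. Note that this sketch produces
nearly equal cell sizes, whereas the cell sizes in Table 1 are less even, so
the procedure actually used (or later attrition) evidently differed in detail.

```python
import random
from collections import Counter

# Condition labels follow the report; everything else here is hypothetical.
CONDITIONS = ["none", "2-second", "20-minute", "24-hour"]


def assign_within_blocks(scores, n_blocks=4, seed=0):
    """Rank students on the blocking score, cut the ranking into contiguous
    blocks, and randomly assign feedback conditions within each block.

    scores maps a student id to the blocking score (here, the total on the
    first two course tests). Returns {student id: (block number, condition)}.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    ranked = sorted(scores, key=scores.get, reverse=True)
    size, extra = divmod(len(ranked), n_blocks)
    assignment = {}
    start = 0
    for block_number in range(1, n_blocks + 1):
        # The first `extra` blocks take one additional student.
        end = start + size + (1 if block_number <= extra else 0)
        block = ranked[start:end]
        rng.shuffle(block)  # random order within the block
        for position, student in enumerate(block):
            condition = CONDITIONS[position % len(CONDITIONS)]
            assignment[student] = (block_number, condition)
        start = end
    return assignment


# Hypothetical roster of 54 students with made-up blocking scores.
scores = {f"student{i:02d}": float(i) for i in range(54)}
assignment = assign_within_blocks(scores)
cell_counts = Counter(assignment.values())  # students per (block, condition)
```

Blocking on a pretest score in this way keeps the feedback groups comparable
in prior achievement, which is the rationale given in the footnote on the
correlation between the blocking variable and test performance.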

Both Instructors A and B used Biehler (1976) as the text, and both required
participation in the computer-managed and criterion tests as part of the course.
However, the way the two instructors conducted their classes differed in several
respects which may be relevant to the results:

1. Instructor A did not include grades obtained in the two tests in the
course grade; Instructor B counted the highest score obtained in the two tests
as part of the final grade.
2. Instructor A gave five exams throughout the semester, discarding the
one with the lowest grade. Exams were on the book and lectures, they were
not discussed in class, and there were no make-up exams until the final week.

3.  Instructor  B gave  seven  exams  throughout  the  semester,  on  the  text 
only.  After  completing  each  exam,  the  student  took  it  to  the  instructor 
to  be  graded  and  then  checked  the  graded  exam  with  a feedback  book,  which 
provided  an  explanation  for  each  answer.  If  desired,  the  student  could  take 
an  alternate  make-up  of  an  exam  on  any  exam  day,  once  each  week.  The  final 
course  grade  was  the  average  of  the  highest  scores  obtained  on  six  exams. 

Testing  Materials 

Computer-Managed  Test 

The computer-managed test consisted of 30 multiple-choice (four
alternatives) items (Form A, Quiz 6; Biehler, 1976). Using Kuder-Richardson
Formula 20, the reliability of this test was r = .75. Informative feedback
consisted of the re-representation of the item, including the stem and all
four alternatives, with a statement indicating the correct alternative (e.g.,
"Alternative C is correct").

This test was administered using an interactive test program of the
SOCRATES system on a Digital Equipment Corporation model PDP-11/45. The
system, along with four Teleray Corporation cathode ray tube (CRT) terminals
and a shared hard-copy printer, was housed in the Education Psychology
Building.
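
For readers unfamiliar with the reliability coefficient reported above for the
computer-managed test, Kuder-Richardson Formula 20 can be sketched as follows.
This is a minimal illustration, not the computation used in the study: the
response matrix is made up, the function name is hypothetical, and the
variance of total scores is taken as the population variance (dividing by the
number of students), which is one common convention.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for dichotomously scored items.

    responses[s][i] is 1 if student s answered item i correctly, else 0.
    KR-20 = (k / (k - 1)) * (1 - sum(p_i * q_i) / variance of total scores),
    where k is the number of items, p_i the proportion correct on item i,
    and q_i = 1 - p_i.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    # Proportion of students answering each item correctly.
    p = [sum(row[i] for row in responses) / n_students for i in range(n_items)]
    sum_pq = sum(p_i * (1 - p_i) for p_i in p)
    # Population variance of the students' total scores (an assumption).
    total_variance = pvariance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)


# Made-up data: 5 students by 4 items (not data from the study).
example = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(example)  # approximately 0.8 for this made-up matrix
```

For the actual test, the matrix would have one row per student and 30 columns,
one per multiple-choice item.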


Criterion  Test 


The criterion test consisted of two parts. The first part consisted
of 47 multiple-choice (four alternatives) items, including the 30 that were
in the computer-managed test and 17 additional items on the same material
(Form B, Quiz 6; Biehler, 1976). The items were presented in a random order
in a dittoed test format. For the 30 items that were in both tests, the
alternatives were in the same order on both tests and on the feedback
presentation.

The second part consisted of 10 questions requiring a word or short
phrase as an answer. Five of these questions were adapted from selected
items in the computer-managed test; that is, they consisted of the stem plus
the correct alternative, but a phrase or word was missing. Five were
different items over the same material.

Additional Measures


For both tests, two additional measures were used: state anxiety and
confidence ratings. These measures are described in the following paragraphs.
State Anxiety Measures


The student's anxiety was measured by the short form of the 20-item
State Anxiety Scale of the State-Trait Anxiety Inventory (Spielberger et al.,
1970). This form consists of those five items having the highest
correlations with the remaining 15 items of the scale. Instructions for completing
this  scale  were  either  standard  (i.e.,  "Indicate  how  you  feel  right  now")  or 
retrospective  (i.e.,  "Indicate  how  you  felt  during  the  task  you  have  just 
finished").  As  indicated  previously,  the  purpose  of  this  scale  was  to 
determine  whether  there  was  a difference  among  the  feedback  conditions  as  to 
the  amount  of  anxiety  the  student  was  experiencing  at  the  presentation  of 
feedback. 

Confidence  Ratings 

Asking  subjects  to  indicate  their  degree  of  confidence  in  each  response 
made  provides  a more  continuous  measure  of  performance  than  correctness  or 
errors, and may prove more sensitive to feedback manipulations. Bayes'
Theorem of conditional probability states that the ratio of the conditional
probabilities of two events, given some datum (the posterior odds ratio), is
equal to the likelihood ratio of that datum under those events times the ratio
of the unconditional prior probabilities of those two events:


$$\frac{p(E_1 \mid D)}{p(E_2 \mid D)} = \frac{p(D \mid E_1)}{p(D \mid E_2)} \times \frac{p(E_1)}{p(E_2)}$$

or:

$$\text{Odds Ratio}_{\text{posterior}} = \text{Likelihood Ratio} \times \text{Odds Ratio}_{\text{prior}}$$


To illustrate, if schizophrenics outnumber depressives 2:1 (prior odds ratio)
and a particular test score is three times as likely for a schizophrenic as for
a depressive (likelihood ratio = 3:1), then a new individual with this test
score is very probably a schizophrenic (posterior odds ratio = 6:1).
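
The arithmetic of this illustration can be checked with a short Python sketch. The function name is hypothetical and the numbers are the ones given in the example above; the conversion from odds to a probability is added here only to make the result easier to read.

```python
from fractions import Fraction


def posterior_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes' Theorem: posterior odds = likelihood ratio * prior odds."""
    return likelihood_ratio * prior_odds


prior = Fraction(2, 1)   # schizophrenics outnumber depressives 2:1
lr = Fraction(3, 1)      # the score is three times as likely for a schizophrenic
post = posterior_odds(prior, lr)

# Convert odds to a probability: odds / (1 + odds).
prob = post / (1 + post)
print(post, prob)        # 6 6/7
```

Posterior odds of 6:1 correspond to a probability of 6/7, or roughly .86, that the new individual is a schizophrenic.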


Traditionally  (Edwards,  Lindman,  & Savage,  1963),  the  logarithm  of 
this  expression  is  taken  to  assure  the  additivity  of  the  various  components: 


$$\mathrm{LOR}_{\text{posterior}} = \mathrm{LLR} + \mathrm{LOR}_{\text{prior}}$$


The LLR is taken to represent the impact or potency of a particular datum;
that is, a very potent datum produces a greater change in probability than a
less potent datum.

For  the  present  experiment,  it  appeared  that  a subjective  probability 
of  the  "correctness"  of  an  answer  could  be  obtained  immediately  prior  to  any 
feedback  (a  prior  probability  of  being  correct),  as  well  as  during  the  recall 
situation following feedback (a posterior probability of being correct).
Use of Bayes' Theorem as outlined above, then, should result in a measure of
the impact or potency of the particular feedback interval (a likelihood
ratio) since:


$$\mathrm{LLR} = \mathrm{LOR}_{\text{posterior}} - \mathrm{LOR}_{\text{prior}}$$


Two aspects of this formulation are important. First, such a procedure
should produce a continuous measure of feedback effectiveness rather
than  the  more  traditional  dichotomous  right-wrong.  Second,  since  the  LLR 
is  a "difference"  score,  individual  differences  between  students  in  response 
level  should  not  affect  this  ratio. 


To obtain this measure, for the multiple-choice items in both tests,
the student was requested to indicate his degree of confidence as to the
correctness of his response. Instructions for completing the confidence ratings
presented a 3-inch scale, with one end-point labelled 1, meaning "Guess"; and
the other, 9, meaning "Certain." For each multiple-choice item, the student
would first choose an alternative and indicate his confidence in that choice
by typing or writing a number from 1 to 9.


In a four-alternative situation, the probability of a correct random
response was .25; thus, a confidence level of 1 (Guess) for either a correct
or an incorrect response was set at p = .25. Since it was assumed that the
remaining values, up to a value of 9, were linear, correct responses
were assessed at equal intervals from .25 to .99; and incorrect responses, at
equal intervals from .25 to .01. For both tests, the log of the ratio p/(1 - p)
was taken as the response measure. For example, a confidence estimate of 1 was
associated with p = .25 and 1 - p = .75; the odds ratio was .25/.75 = .3333; and
the log odds ratio was log10 .3333 = -0.47712. As noted above, this value for
the computer-managed test is an LOR_prior, and that for the criterion test is
an LOR_posterior.
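
The transformation from confidence ratings to log odds described above can be sketched in Python. This is a minimal illustration, not the authors' program. The function names are hypothetical. The ratio is assumed to be p / (1 - p), with p the subjective probability of being correct, and base-10 logarithms are assumed from the worked example in the text.

```python
import math


def rating_to_probability(rating, correct, n_levels=9):
    """Map a 1-9 confidence rating to a subjective probability of being correct.

    Rating 1 (Guess) maps to .25, the chance level with four alternatives.
    Rating 9 maps to .99 for a correct response and .01 for an incorrect one,
    with equal steps in between (the linearity assumption stated in the text).
    """
    if not 1 <= rating <= n_levels:
        raise ValueError("rating must be between 1 and n_levels")
    end = 0.99 if correct else 0.01
    return 0.25 + (rating - 1) * (end - 0.25) / (n_levels - 1)


def log_odds(p):
    """Base-10 logarithm of the odds p / (1 - p)."""
    return math.log10(p / (1.0 - p))


def log_likelihood_ratio(prior_rating, prior_correct, post_rating, post_correct):
    """LLR for one item: LOR_posterior (criterion test) minus LOR_prior (initial test)."""
    lor_prior = log_odds(rating_to_probability(prior_rating, prior_correct))
    lor_post = log_odds(rating_to_probability(post_rating, post_correct))
    return lor_post - lor_prior
```

Under these assumptions, a rating of 1 yields a log odds of about -0.477, and an item answered incorrectly with a rating of 1 on the first test and correctly with a rating of 9 on the criterion test yields an LLR of about 2.47.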


Procedure 


Computer-Managed  Test 

Each student reported to the computer testing room to take the
computer-managed test at a time scheduled by him sometime between the 11th and 13th
weeks of the semester. At this session, the student was: (1) given the short
form of the A-State Scale with standard instructions (i.e., "Indicate how you
feel right now"), (2) briefed on the use of the computer, (3) given two sample
items, and (4) presented the test by the computer, one item at a time. The
student responded to each item by typing the letter of his chosen alternative
and the number of his confidence rating.

Students in the 2-second delay interval group (N = 28) received
informative feedback immediately after they had responded to each item, with
instructions to study the feedback but not to write anything. Students in
the 20-minute delay interval group (N = 27) received feedback on all 30 items
in a series after they had completed the test, with instructions to study
the feedback but not to write anything. Students in the 24-hour delay
interval group (N = 30) and in the control group (N = 27) received no
feedback at the first session.

Students in all groups were given the short form of the A-State Scale
immediately after they had completed the test. However, this time, the
instructions were retrospective (i.e., "Indicate how you felt during the task
you have just finished"). Students in the 20-minute delay interval group
were given the scale again after they had received and studied their feedback.

Twenty-four hours after a student had taken the computer-managed test,
he returned to the computer testing room. If he was in the control, 2-second
delay interval, or 20-minute delay interval group, he was given the short form
of the A-State Scale with standard instructions, checked into the computer and
given a report of his total score on the quiz (the number and the percentage
correct), and then dismissed. However, if he was in the 24-hour delay interval
group, he was:

1. Given the A-State Scale with standard instructions.

2. Checked into the computer and presented with informative feedback
on the entire set of items he had completed 24 hours previously, with
instructions to study the feedback but not to write anything.

3. Given the A-State Scale again, but this time with retrospective
instructions.

4.  Checked  into  the  computer  again  to  receive  the  total  score  obtained 
on  the  test. 

5.  Dismissed. 

On  the  computer-managed  test,  students  had  complete  control  of  the 
time  that  (1)  each  question  was  exposed  both  before  and  after  they  had  recorded 
their  response  and  confidence  rating  and  (2)  informative  feedback  for  each  item 
was  exposed — up  to  1 minute,  when  feedback  was  removed  automatically.  However, 
the  testing  session  was  programmed  so  that  no  student  could  proceed  to  the  next 
item  or  to  the  next  task  until  he  had  responded  to  the  preceding  ones.  Also, 
he could not change an answer once it had been recorded and he could not review
previous questions. A research assistant was present at all times during both
computer sessions to check each student into the computer, to administer the
A-State  Scales,  and  to  proctor  the  tests. 

Criterion  Test 

The  criterion  test  was  given  during  regular  class  periods,  at  least  1 
week  and  no  longer  than  3 weeks  after  the  student  had  taken  the  computer-managed 
test. The part consisting of 10 short-answer items was given and collected
before the part containing the 47 multiple-choice items was distributed. For
each of these latter items, the student wrote the letter of his chosen alternative
and the number of his confidence rating. After both tests were completed,
the  A-State  Scale  with  retrospective  instructions  was  administered.  Finally, 
the  student  completed  a questionnaire  on  the  amount  of  studying  he  had  done 
before  and  after  the  computer-managed  test  and  his  reactions  to  taking  the 
test  on  the  computer. 

Analyses 

Analyses  were  conducted  to  determine  whether  there  were  any  differences 
between the feedback groups in (1) performance on multiple-choice items
included in both tests, multiple-choice items included in the criterion
test  only,  and  short-answer  "same"  and  "different"  items;  (2)  time  required 
to  answer  the  items  and  to  study  informative  feedback;  and  (3)  anxiety 
experienced  before  and  after  testing.  These  analyses  are  described  in 
detail  in  the  following  section. 


RESULTS 


Performance on Multiple-Choice Items Appearing in Both Computer-Managed and
Criterion  Tests 

Student  performance  on  the  30  multiple-choice  items  that  appeared  in  both 
the  computer-managed  and  criterion  tests  was  of  primary  interest.  Thus,  for 
these  items,  analyses  were  conducted  to  determine  the  effect  of  the  various 
feedback  conditions  on  the  following: 

1.  The  mean  number  correct  in  both  tests. 

2.  The  proportion  of  items  that  were  correct  on  the  criterion  test  which 
had  been  either  correct  or  incorrect  on  the  initial  test. 

3.  The  degree  of  change  in  confidence  ratings  from  the  computer-managed 
test  to  the  criterion  test. 

These  analyses  are  described  in  the  following  paragraphs. 


Mean  Number  Correct  in  Both  Tests 

Figure 1 shows the mean number of "same" multiple-choice items correct
on both tests for all feedback conditions. Initially, separate unequal n
randomized block analyses of variance (ANOVAs) of each test were conducted,
indicating a significant effect of blocks on both tests (computer test,
F (3,80) = 11.37, p < .01, and criterion test, F (3,80) = 4.57, p < .01).
However, since none of the interactions between blocks and either instructor or
feedback conditions was significant, an overall unequal n ANOVA for these two
test measures (computer and criterion test scores) was conducted with two
between-group variables, instructor and feedback conditions. Neither the overall
effect of instructor nor the interactions among instructor, feedback,
and test were significant. However, there was a significant increase in
the mean number correct from the computer-managed test to the criterion test
(F (1,104) = 48.56, p < .01).

Analysis of the simple main effects of feedback conditions conducted
at each test level showed no significant effect on the computer-managed test
(F (3,104) = .40). However, the feedback conditions did have a significant
effect on the criterion test (F (3,104) = 6.70, p < .01). When this effect
was analyzed by three a priori planned orthogonal comparisons, the mean
correct for the immediate feedback condition (2-second interval) and the two
delayed feedback conditions (20-minute and 24-hour delay intervals) combined
was significantly greater than that for the no feedback condition
(F (1,104) = 15.56, p < .01). Also, the mean correct for the two delayed
feedback conditions combined was significantly greater than that for the
immediate feedback condition (F (1,104) = 4.54, p < .05). There was no
significant difference between the 20-minute and 24-hour delayed feedback
conditions.
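
The three planned comparisons described above can be written as contrast weights over the four conditions. The Python sketch below is illustrative only: the report does not list the weights, so the integer weights, the condition order, and the function names are assumptions. It checks orthogonality under equal group sizes; with the unequal group sizes of this study, strict orthogonality would also depend on the number of students per group.

```python
# Hypothetical contrast weights, in the condition order:
# no feedback, 2-second, 20-minute, 24-hour.
CONTRASTS = {
    "feedback vs. no feedback": (-3, 1, 1, 1),
    "delayed vs. immediate": (0, -2, 1, 1),
    "24-hour vs. 20-minute": (0, 0, -1, 1),
}


def is_orthogonal_set(contrasts):
    """True if every contrast sums to zero and every pair has a zero dot product."""
    rows = list(contrasts.values())
    sums_ok = all(sum(r) == 0 for r in rows)
    dots_ok = all(
        sum(a * b for a, b in zip(rows[i], rows[j])) == 0
        for i in range(len(rows))
        for j in range(i + 1, len(rows))
    )
    return sums_ok and dots_ok


def contrast_value(weights, means):
    """Weighted sum of the condition means for one contrast."""
    return sum(w * m for w, m in zip(weights, means))


print(is_orthogonal_set(CONTRASTS))  # True
```

Each comparison uses one degree of freedom, which matches the F (1,104) tests reported for the individual comparisons.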




[Figure 1 plot not reproduced. Horizontal axis: feedback condition (0 = No Feedback; Feedback Delay of 2 sec., 20 min., and 24 hr.). Plotted series: criterion test and computer-managed test.]

Figure 1. Mean number of "same" multiple-choice items correct
on computer-managed and criterion tests for each
feedback condition.

Proportion Correct on the Criterion Test That Had Been Correct or
Incorrect on the Computer-Managed Test

The  effect  of  feedback  conditions  on  the  proportion  of  items  that 
were  correct  on  the  criterion  test  was  analyzed  separately  for  items  that 
were  correct  and  that  were  incorrect  on  the  computer-managed  test.  Each 
student's  responses  on  the  computer-managed  test  were  divided  into  those 
that  were  correct  or  incorrect,  and  the  proportions  of  each  of  these  that 
were  correct  on  the  criterion  test  were  compared  among  the  four  feedback 
conditions. 
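
The per-student split described above can be sketched in Python. The function name and the sample responses are hypothetical; only the splitting rule comes from the text.

```python
def conditional_proportions(initial_correct, criterion_correct):
    """Proportion correct on the criterion test, split by initial-test outcome.

    Both arguments are equal-length sequences of booleans for one student,
    in the same item order. Returns (p_given_right, p_given_wrong); an entry
    is None when the student had no items in that category.
    """
    if len(initial_correct) != len(criterion_correct):
        raise ValueError("sequences must have the same length")
    right = [c for i, c in zip(initial_correct, criterion_correct) if i]
    wrong = [c for i, c in zip(initial_correct, criterion_correct) if not i]
    p_right = sum(right) / len(right) if right else None
    p_wrong = sum(wrong) / len(wrong) if wrong else None
    return p_right, p_wrong


# Made-up responses for one student on five items.
initial = [True, True, False, False, True]
criterion = [True, False, True, False, True]
result = conditional_proportions(initial, criterion)
print(result)
```

For this made-up student, two of the three initially correct items and one of the two initially incorrect items are correct on the criterion test.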

For  the  proportion  of  items  that  were  correct  on  both  tests,  there 
was  no  significant  difference  among  any  of  the  four  feedback  conditions  (no 
feedback  = .89;  2-second,  20-minute,  and  24-hour  delay  intervals  = .90,  .93, 
and  .93  respectively).  This  indicates  that  the  performance  of  the  control 
group,  which  received  no  informative  feedback,  did  not  differ  from  that  of 
the  groups  receiving  feedback  on  this  measure.  However,  for  the  proportion 
of  items  that  were  wrong  on  the  first  test  and  right  on  the  second  test, 
effects  of  feedback  conditions  were  similar  to  those  obtained  in  the  analysis 
of mean number of items correct on the criterion test. That is, the
proportion for the 24-hour delayed feedback condition (.72) did not differ from
that for the 20-minute delayed feedback condition (.67; test statistic = 1.24);
the proportion for the two delayed feedback conditions was significantly higher
than that for the 2-second immediate feedback condition (.53; test statistic =
3.12, p < .01); and the proportion for the immediate feedback condition was
significantly higher than that for the no feedback condition (.39; test
statistic = 3.34, p < .01).

Change  in  Confidence  Ratings  from  Computer-Managed  to  Criterion  Test 

As noted previously, the transformation of the confidence ratings
given in the initial computer-managed test yielded a log prior odds ratio
(LOR_prior), and those given in the criterion test, a log posterior odds
ratio (LOR_posterior). In accordance with Bayes' Theorem, the difference
between these two values is the log likelihood ratio (LLR), which should
assess the impact or potency of the feedback condition per se. Consequently,
LLRs were established for each initial-criterion response set for each
item (i.e., wrong-wrong, wrong-right, right-wrong, and right-right).

These  LLRs  were  then  analyzed  by  an  unweighted  means  ANOVA  involving  the 
four  feedback  conditions,  crossed  with  the  four  response  classifications. 
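
The grouping of item-level LLRs by response classification can be sketched in Python. The names and sample values are hypothetical, and the sketch computes only the mean LLR per classification; it does not reproduce the unweighted means ANOVA.

```python
from collections import defaultdict


def classify(initial_correct, criterion_correct):
    """Label a response pair, e.g. 'wrong-right' = initially wrong, later right."""
    first = "right" if initial_correct else "wrong"
    second = "right" if criterion_correct else "wrong"
    return f"{first}-{second}"


def mean_llr_by_class(records):
    """records: iterable of (initial_correct, criterion_correct, llr) tuples."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for initial_correct, criterion_correct, llr in records:
        label = classify(initial_correct, criterion_correct)
        sums[label] += llr
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


# Made-up item records for illustration.
records = [(False, True, 2.0), (False, True, 1.0), (True, True, 0.5)]
means = mean_llr_by_class(records)
print(means)  # {'wrong-right': 1.5, 'right-right': 0.5}
```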

It is important to note that, since a difference score was generated, it
could be assumed that between-subject differences had been eliminated and
that only treatment and within-subject error affected the ratio. Even
if this assumption is not entirely valid, the results cited here are
conservative and should not lead to excessive Type I errors.

The LLR for each of the four feedback conditions and response
classifications is presented in Table 2. Although there was a significant effect
of response classification (F (3,3344) = 1342.23, p < .01), this effect is
meaningless since it merely reflects the differences between the types of
responses per se; for example, wrong-right responses are positive by
definition, and right-wrong responses, negative by definition. There was a
significant effect of feedback conditions (F (3,3344) = 5.58, p < .01), as well
as a significant interaction between feedback condition and response
classification (F (3,3344) = 8.15, p < .01). When simple main effects of feedback
conditions for each response classification were analyzed, there were no
significant differences among feedback conditions for wrong-wrong responses
(F (3,3344) = .84) or for right-right responses (F (3,3344) = .69). Thus,
the confidence ratings of these two response classifications were not
affected by either different feedback conditions or the presence or absence
of feedback.


Table 2

Change in Confidence Ratings from Computer-Managed to
Criterion Test (Log Likelihood Ratio)

                                  Feedback Conditions
Response
Classification         0         2-sec.     20-min.     24-hr.

Wrong-Wrong         -0.1971     -0.0298     -0.1438     -0.0616
Right-Right          0.3162      0.3384      0.3020      0.4542
Wrong-Right          1.8153      1.6250      2.0500      2.4181
Right-Wrong         -1.5762     -1.4716     -2.1199     -1.7351

Feedback conditions did have a significant effect for wrong-right
responses (F (3,3344) = 16.76, p < .01) and for right-wrong responses
(F (3,3344) = 11.74, p < .01). Thus, for both types of responses, a
Newman-Keuls analysis was conducted on the differences among the means. For
the wrong-right responses, there was a significantly greater increase (at
the .01 level) in confidence ratings for the 24-hour feedback group than for
each of the other three groups, and the increase for the 20-minute feedback
group was significantly greater than that for the 2-second group. The other
comparisons were not significant. Since a wrong-right response is one where
an initially wrong response is changed to a correct one, an increased change
in confidence rating is desirable. Thus, in this study, providing feedback
after a 24-hour delay was significantly superior to all other feedback
conditions. Even providing feedback after a 20-minute delay was superior to
providing immediate (2-second interval) feedback.

Results of the Newman-Keuls analysis of the right-wrong responses
showed that the 20-minute feedback group had a significantly greater change
(at the .01 level) in confidence ratings than each of the other three groups.
Since a right-wrong response is one where an initially right response is
changed to a wrong one, a larger change in the confidence rating in this case
is not desirable. Thus, this analysis indicates that providing feedback
after a 20-minute delay is significantly worse in this respect than any other
feedback condition. It is particularly interesting to note that no differences
on this measure were found between the 24-hour delayed feedback group and the
no feedback (control) group.

In  addition,  an  analysis  of  the  overall  level  of  change  in  confidence 
ratings  was  made  for  both  wrong-wrong  and  right-right  responses.  For  wrong- 
wrong  responses,  the  mean  change  in  confidence  ratings  appears  to  be  small 
(F (3,3344) = 3.50, p < .05); however, it is significant statistically.

This  indicates  that,  even  though  the  response  was  wrong  in  both  tests,  the 
student's  confidence  that  his  answer  was,  in  fact,  correct  was  significantly 
reduced;  and  this  reduction  occurred  whether  or  not  feedback  of  the  correct 
answer  was  provided. 

For  right-right  responses,  there  was  a highly  significant  increase  in 
the overall level of change in confidence ratings. Since this increase
occurred for the no feedback (control) group, as well as for those receiving
feedback, the time interval seems to be primarily responsible.

Item  Difficulty  and  Feedback 

Further, an analysis was conducted on the effect of feedback conditions
on the mean correct on the criterion test as a function of the initial
difficulty of the items. The items were divided into three sets of 10 items
each, based on the percentage of students who had each item correct on the
initial computer-managed test. The mean correct on the criterion test for
each feedback condition for each of three levels of difficulty is shown in
Figure 2. An unequal n ANOVA on these data was conducted with the four
feedback conditions crossed with the three levels of item difficulty, and both
main effects were significant: feedback, F (3,108) = 7.69, p < .01; and item
difficulty, F (2,216) = 58.98, p < .01. An analysis of the simple main effects
of feedback conditions at each level of item difficulty indicated that they had
a significant effect only on the most difficult items (F (3,108) = 11.13, p < .01).
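
The division of the 30 items into three difficulty sets can be sketched in Python. The function name, the tie-breaking rule, and the sample percentages are assumptions made for illustration; the report states only that the sets contained 10 items each and were based on the percentage of students answering each item correctly on the initial test.

```python
def split_by_difficulty(percent_correct, n_sets=3):
    """Split item indices into equal-sized sets ordered from easiest to hardest.

    percent_correct[i] is the percentage of students who answered item i
    correctly on the initial test. Ties are broken by item index (an assumption).
    """
    n_items = len(percent_correct)
    if n_items % n_sets != 0:
        raise ValueError("number of items must divide evenly into the sets")
    # Highest percentage correct first, so the first set holds the easiest items.
    ordered = sorted(range(n_items), key=lambda i: (-percent_correct[i], i))
    size = n_items // n_sets
    return [ordered[k * size:(k + 1) * size] for k in range(n_sets)]


# Made-up percentages: item i was answered correctly by (95 - 2*i) percent.
percents = [95 - 2 * i for i in range(30)]
easy, medium, hard = split_by_difficulty(percents)
print(len(easy), len(medium), len(hard))  # 10 10 10
```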

Additional  a priori  planned  orthogonal  comparisons  indicated  the  same 
relationships  among  the  feedback  conditions  for  the  most  difficult  items  as 
found above: The performance of groups receiving feedback was superior to that
of the no feedback (control) group (F (1,108) = 19.23, p < .01); the performance
of the two delayed feedback groups (20-minute and 24-hour delay intervals) was
superior to that of the immediate (2-second delay) feedback group; and there was
no significant difference in the performance of the two delayed feedback groups.
This  analysis  is  at  least  partly  redundant  with  the  analysis  indicating  that  the 
effects  of  feedback  conditions  occurred  only  on  wrong-right  responses,  since 
the  most  difficult  items  selected  were  those  that  were  most  often  wrong  on 
the  computer-managed  test. 

Figure 2. Mean correct on criterion tests for three levels of item
difficulty for each feedback condition.

Performance on Criterion Test Items

"Different" Multiple-Choice Items

As indicated previously, the criterion test included 17 multiple-choice
items that were not in the initial computer-managed test. An ANOVA
was performed on the mean number correct of these "different" items, using
three between-group variables: instructor, blocks (I through IV for each
instructor; see Table 1), and feedback conditions. Results indicated that
there was a significant difference between the two instructors (F (1,80) =
4.09, p < .05), as well as a significant effect of blocks (F (3,80) =
2.96, p < .05). Further analysis indicated that the students in blocks
I and II had significantly higher scores than those in blocks III and IV
(F (1,80) = 7.51, p < .01). The effect of feedback conditions was not
significant (F (3,80) = .33).

Short-Answer Items


The  criterion  test  also  included  10  short-answer  items,  five  that 
were  adapted  from  multiple-choice  items  in  the  computer-managed  test,  and 
five  others  on  the  same  material.  Responses  to  these  items  were  scored  on 
a scale from 0 to 4; thus, the student could score up to 20 on the five
"same" items and up to 20 on the five "different" items.

This  part  of  the  criterion  test  was  scored  as  follows: 

1. An initial judge made a detailed key for the test.

2. Two different judges independently scored all of the tests.

3. Discrepancies between the two scores on any item were resolved
by the initial judge.

An unequal n ANOVA was conducted on these data, with three between-group
variables: instructor, feedback conditions, and blocks. The scores for
the "same" and "different" items were repeated measures. Results showed
that the mean for the "same" items (12.65) was significantly higher than
that for the "different" items (10.77), F (1,80) = 12.21, p < .01; and that
blocks had a significant effect (F (3,80) = 8.69, p < .01). The effect of
feedback conditions (F (3,80) = .97) and the other comparisons were not
significant.

Amount  of  Time  Spent  on  Computer-Managed  Test 

For  each  student,  the  length  of  time  each  item  was  exposed  on  the  computer- 
managed  test  was  recorded,  and  the  total  time  spent  on  the  30  items  was  analyzed 
by  an  unequal  n ANOVA  with  two  between-groups  variables,  feedback  conditions 
and  blocks.  There  were  no  significant  effects  on  this  measure. 

Also,  for  each  student  in  the  three  groups  receiving  feedback,  the  time 
spent on feedback for each item was recorded. An unequal n ANOVA was
conducted on the total time spent on feedback for the 30 items, with two
between-groups variables: feedback conditions and blocks. There were no significant
effects.

State Anxiety Measures

As  indicated  in  the  METHOD  section,  the  short  form  of  the  State  Anxiety 
Scale  was  administered  at  the  following  times: 

1. To students in all feedback groups before they took the
computer-managed test (with standard instructions), and immediately after they had
completed the test (with retrospective instructions). (At this time, the
immediate (2-second delay) feedback group had received and studied their
feedback.)

2.  To  students  in  the  20-minute  delayed  feedback  group  after  they  had 
received  and  studied  their  feedback  (with  retrospective  instructions). 

3.  To  students  in  all  feedback  groups  when  they  returned  to  the  computer 
testing  room  24  hours  later  to  receive  their  score  (with  standard  instructions). 

4.  To  students  in  the  24-hour  delayed  feedback  group  after  they  had 
received  and  studied  their  feedback  and  before  they  received  their  test  score 
(with  retrospective  instructions). 

5. To students in all feedback groups after they had completed the
criterion  test  (with  retrospective  instructions). 

Each A-State questionnaire was scored by two independent judges. On
the first measure, taken before the computer-managed test was administered,
there was a significant difference among the feedback groups (F (3,80) =
4.65, p < .01). Although the three a priori orthogonal comparisons planned
for analysis of the later measures were not directly meaningful to this
first measure, they were conducted to provide a more accurate basis for
interpreting the later comparisons. Results indicated that the A-State
scores for the 2-second immediate feedback group were significantly higher
than the combined scores for the 20-minute and 24-hour delayed feedback
groups (F (1,80) = 12.62, p < .01). There was also a significant interaction
between blocks and feedback conditions (F (9,80) = 2.12, p < .05).

Since there was an initial difference in the A-State measure among the
feedback conditions, an analysis was made of the difference between the initial
A-State measure and the measure taken after informative feedback had been
presented. For the 2-second immediate feedback and no feedback groups, the
difference was computed between the initial measure and the measure taken
directly after the computer-managed test had been completed (see 1 above).
For the 20-minute and 24-hour delayed feedback groups, the difference was
computed between the initial measure and the measure taken directly after
they had received and studied their feedback (see 2 and 4 above). An unequal
n ANOVA conducted on these measures indicated no significant effects.

The means for the A-State measure taken when students returned to the
computer testing room 24 hours after they had taken the computer-managed exam
(see 3 above) were compared to determine if there was any difference between
the groups who had received feedback on the day they had taken the
computer-managed test (the 2-second immediate feedback and the 20-minute delayed
feedback groups) and those who had not (the no-feedback and the 24-hour
delayed feedback groups). Results indicated that the mean A-State measure
for those who had not received feedback (8.30) was significantly higher than
that for those who had (7.24), F (1,79) = 4.41, p < .05.

An unequal n ANOVA was conducted on the A-State measures taken [...]

Reactions Toward Computer-Managed Test

Responses to the question concerning student reactions toward the
computer-managed test were categorized as positive (41%), negative (23%), or
neutral (36%). A chi square analysis indicated that there were no significant [...]

Results of this study, obtained with computer-managed tests in an educational
setting, have confirmed a consistent finding of previous laboratory
experiments: Long-term retention of course material following immediate
informative feedback is not superior to that with delayed informative feedback.
As stated earlier, in previous experiments, the term "immediate feedback"
was used both for conditions where feedback is presented item-by-item
immediately (approximately 2-second delay) after the student makes each response,
and for conditions where it is presented after the student has responded to
an entire set of items (20-minute delay). In the present experiment, as
in previous experiments, retention following either the 2-second or
20-minute feedback conditions was not superior to that following a longer
delay. It is of particular interest that the present results occurred in
an educational setting with no control over study of material and in classes
in which instructional practices differed in seemingly important ways.

Present results indicate that retention performance following 20-minute
or 24-hour delays was superior to that following a 2-second delay. This difference was
produced by the proportion of items wrong on the initial test and right on
the criterion test (wrong-right responses). This finding is consistent with
the perseveration-interference interpretation of the delay retention effect
(Kulhavy & Anderson, 1972). However, the fact that there was no difference
in wrong-right responses between the 20-minute and 24-hour delayed feedback
conditions raises questions about the length of time response traces
perseverate and indicates a need for clarification of this interpretation.

The present results also support the interpretation that differences at
retention are due to conditions of presentation of feedback per se and not
to indirect effects such as increased motivation, studying, etc. Feedback
conditions  had  no  effect  on  "different"  multiple-choice  items  in  the  criterion 
test,  for  which  no  feedback  was  provided.  Also,  groups  receiving  feedback 
did  better  on  the  criterion  test  than  the  no  feedback  (control)  group. 

The delay retention effect, however, was not as marked in the present
study as it has been in previous laboratory experiments. There was no
difference in performance on the multiple-choice criterion test between the
20-minute and 24-hour delayed feedback conditions, and feedback intervals
(or even the presence or absence of feedback) had no effect on the
short-answer criterion test. Although it is not clear why the longer
24-hour delay did not result in superior retention, it is true that several
conditions of the present experiment "pushed the limits" of those in
previous experiments.

The initial performance level prior to feedback was relatively high and the
retention interval for many students was longer than that previously used.
Also, the reliability of both the computer-managed and criterion tests was
relatively low. The fact that the delay retention effect occurred for only
the items with the highest level of difficulty does suggest that a greater
effect might occur with more difficult test items. However, it seems likely
that the present results, or lack of results, on this measure are due to
a combination of factors.

Using the confidence ratings, which were included to provide a more
continuous measure of retention than the more traditional measures,
retention following the longer 24-hour delayed feedback was superior to
that of all other feedback conditions. This finding is consistent with the
interpretation that students learn more when they receive information about
the correct answer after a longer delay interval (Sturges, 1969; 1972), and
with interpretations that retention is a function of both the level or
depth of processing and the spread of encoding (e.g., Craik & Tulving,
1975). Improved retention following greater depth of processing and/or
spread of encoding has been reported for both intentional and incidental
learning, and has occurred in situations similar to that of the present
study (Craik & Tulving, 1975). Also, Seamon, Murray, and Barclay (1976)
report that both confidence ratings and accuracy are higher in an
incidental learning recognition test after engaging in a semantic-orienting
task (with meaningful words) rather than a structural-orienting task.

It is hypothesized that, after a longer delay interval, students engage
in a more thorough semantic analysis of information presented at feedback,
which accounts for the increased confidence ratings on the retention test.
However, the finding that feedback intervals had no effect on the time
spent in answering the items and in studying the feedback is not consistent
with this hypothesis unless it is assumed that retention following
informative feedback is a function of how information is processed, and
that different kinds of processing take about the same amount of time.
Clearly, more direct study of this problem is necessary.

Previous studies have reported that the relative impact of immediate
and delayed feedback on later retention varies with the relationships among
the forms of the initial item, the feedback presentation, and the criterion
test item (Sturges, 1969; 1972). Thus, the interpretations and implications
of the findings of the present report are limited since all items, in both
tests, were presented in the same form (i.e., with the same alternatives in
the same order). Another limitation of the present results is the fact that
there was only one test session with informative feedback. No evidence
currently exists on the relative effectiveness of immediate and delayed
feedback for repeated computer-managed tests throughout a course.

RECOMMENDATIONS


It is recommended that:

1. Further research be conducted to extend the findings of the present
study by comparing the relative effects of immediate and delayed feedback
under other experimental conditions; for example, using different forms of
feedback presentation and/or criterion test items, and conducting repeated
computer-managed tests with informative feedback throughout the course.

2. Procedures that increase student concentration on informative
feedback and thus presumably influence the level or breadth of processing
be developed and evaluated.

3. The appropriateness of requiring immediate feedback to be provided
in curriculum development (NAVEDTRA 106A, 1975, Phase III, Table III.A,
p. 9) be reconsidered since such conditions may not be optimum for
learning.